import numpy as np
import pandas as pd
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"
Financial and economic data is mostly displayed and stored as time series. The time series data in the csv file is in long format
# read data
df_prices = pd.read_csv('price_data.csv')
df_prices
| stock | date | price | |
|---|---|---|---|
| 0 | AMZN | 2000-03-07 | 3.175 |
| 1 | AMZN | 2000-01-03 | 4.469 |
| 2 | AMZN | 2000-01-04 | 4.097 |
| 3 | AMZN | 2000-01-05 | 3.488 |
| 4 | AMZN | 2000-01-06 | 3.278 |
| ... | ... | ... | ... |
| 17356 | MSFT | 2022-12-23 | 238.730 |
| 17357 | MSFT | 2022-12-27 | 236.960 |
| 17358 | MSFT | 2022-12-28 | 234.530 |
| 17359 | MSFT | 2022-12-29 | 241.010 |
| 17360 | MSFT | 2022-12-30 | 239.820 |
17361 rows × 3 columns
# data in wide format
print('---> csv data in wide form')
df_prices_wide = df_prices.pivot(index='date', columns='stock', values='price')
df_prices_wide
---> csv data in wide form
| stock | AAPL | AMZN | MSFT |
|---|---|---|---|
| date | |||
| 2000-01-03 | 0.999 | 4.469 | 58.2813 |
| 2000-01-04 | 0.915 | 4.097 | 56.3125 |
| 2000-01-05 | 0.929 | 3.488 | 56.9063 |
| 2000-01-06 | 0.848 | 3.278 | 55.0000 |
| 2000-01-07 | 0.888 | 3.478 | 55.7188 |
| ... | ... | ... | ... |
| 2022-12-23 | 131.860 | 85.250 | 238.7300 |
| 2022-12-27 | 130.030 | 83.040 | 236.9600 |
| 2022-12-28 | 126.040 | 81.820 | 234.5300 |
| 2022-12-29 | 129.610 | 84.180 | 241.0100 |
| 2022-12-30 | 129.930 | 84.000 | 239.8200 |
5787 rows × 3 columns
# the 3 stocks price series diplayed on the same chart
fig1 = px.line(data_frame=df_prices, x='date', y='price', line_group='stock', color='stock', title='Stock Price Series')
fig1.show()
It may be difficult to compare the 3 price series above because the starting price is not the same. To overcome this, calculate the return series $r_t$ for a stock and build a price series from a common base value. The total return for a stock at time $T$ is defined as $$ TR(T) = 100 \times \Big[ \Pi_{i=1}^{t\leq T} \big( 1 + r_{t} \big)\Big]$$ Return at time $t$ is $$ r_t = \frac{p_t - p_{t-1}}{p_{t-1}}$$ Compare the Total Return series for the 3 stocks
# convert price series to return series
df_returns = df_prices_wide.pct_change()
df_returns = df_returns.dropna()
# plot - directly going from wide to long form for the plot
fig2 = px.line(data_frame=df_returns.reset_index().melt(id_vars='date'), x='date', y='value', line_group='stock', color='stock', title='Stock Return Series')
fig2.show()
# calc total return
df_returns = df_returns + 1
df_returns = 100 * df_returns.cumprod()
df_returns
# plot
fig3 = px.line(data_frame=df_returns.reset_index().melt(id_vars='date'), x='date', y='value', line_group='stock', color='stock', title='Stock Total Return Series')
fig3.show()